In the introduction to GPy
you saw how it was possible to build covariance functions with the GPy software. The covariance function contains the assumptions about the data in it. In the Gaussian process, the covariance funciton encodes the model. However, to make predictions, you need to combine the model with data.
If the data we are given, $\mathbf{y}$ is real valued, then the problem is known as a regression problem. If a Gaussian noise model is used, then this is known as Gaussian process regression.
In [1]:
%matplotlib inline
import numpy as np
import pods
import pylab as plt
import GPy
from IPython.display import display
We will now combine the Gaussian process prior with some data to form a GP regression model with GPy. We will generate data from the function $$ f(x) = − \cos(\pi x ) + \sin(4\pi x ) $$ over $[0, 1]$, adding some noise to give $y(x) = f(x) + \epsilon$, with the noise being Gaussian distributed, $\epsilon \sim \mathcal{N}(0, 0.01)$.
In [2]:
X = np.linspace(0.05,0.95,10)[:,None]
Y = -np.cos(np.pi*X) + np.sin(4*np.pi*X) + np.random.normal(loc=0.0, scale=0.1, size=(10,1))
plt.figure()
plt.plot(X,Y,'kx',mew=1.5)
Out[2]:
A GP regression model based on an exponentiated quadratic covariance function can be defined by first defining a covariance function,
In [3]:
k = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=0.2)
And then combining it with the data to form a Gaussian process model,
In [6]:
model = GPy.models.GPRegression(X,Y,k)
Just as for the covariance function object, we can find out about the model using the command display(m)
.
In [7]:
display(model)
Note that by default the model includes some observation noise with variance 1. We can see the posterior mean prediction and visualize the marginal posterior variances using
model.plot()
In [8]:
_ = model.plot()
In [12]:
# Exercise 2 answer here
As we saw when we introduced GPy and covariance functions, random sample paths from the conditional GP can be obtained using
np.random.multivariate_normal(mu[:,0],C)
Now you can sample paths from the posterior process by first obtaining the mean and covariance of the posterior, mu
and C
. These can be obtained from the predict
method,
mu, C, up95, lo95 = model.predict(Xp,full_cov=True)
In [13]:
# Exercise 3